AITopics

Technology: Information Technology > Artificial Intelligence > Machine Learning > Neural Networks (1.00)

Megiddo, Nimrod, Wasserkrug, Segev, Davidovich, Orit, Shtern, Shimrit

Finding Probably Approximate Optimal Solutions by Training to Estimate the Optimal Values of Subproblems

arXiv.org Artificial Intelligence, Nov-5-2025

This paper develops a solver for maximizing a real-valued function of binary variables. The solver relies on an algorithm that estimates the optimal objective-function value of instances from the underlying distribution of objectives and their respective sub-instances. The training of the estimator is based on an inequality that facilitates the use of the expected total deviation from optimality conditions as a loss function rather than the objective function itself. Thus, it does not calculate values of policies, nor does it rely on solved instances.

artificial intelligence, machine learning, optimization problem, (15 more...)

2511.02048

Country: Asia > Middle East > Israel (0.14)

Genre: Research Report (0.40)

Technology:

Information Technology > Artificial Intelligence > Machine Learning (1.00)
Information Technology > Artificial Intelligence > Representation & Reasoning > Optimization (0.65)

Neural Information Processing Systems, Aug-15-2025, 00:47:30 GMT

89562dccfeb1d0394b9ae7e09544dc70-Supplemental.pdf

equation, particle, vector field, (16 more...)

Technology: Information Technology > Artificial Intelligence > Machine Learning > Neural Networks (1.00)

Neural Information Processing Systems, Oct-3-2024, 22:20:59 GMT

Regret Minimization in MDPs with Options without Prior Knowledge

Recent works leveraged the mapping of Markov decision processes (MDPs) with options to semi-MDPs (SMDPs) and introduced SMDP-versions of exploration-exploitation algorithms (e.g.,

algorithm, confidence interval, temporal abstraction, (12 more...)

Country:

North America > United States > California > San Francisco County > San Francisco (0.14)
North America > United States > New York > New York County > New York City (0.04)
North America > United States > Florida > Broward County > Fort Lauderdale (0.04)
(3 more...)

Technology:

Information Technology > Artificial Intelligence > Representation & Reasoning (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Learning Graphical Models > Undirected Networks > Markov Models (0.68)

Wan, Yi, Yu, Huizhen, Sutton, Richard S.

On Convergence of Average-Reward Q-Learning in Weakly Communicating Markov Decision Processes

arXiv.org Artificial Intelligence, Aug-29-2024

This paper analyzes reinforcement learning (RL) algorithms for Markov decision processes (MDPs) under the average-reward criterion. We focus on Q-learning algorithms based on relative value iteration (RVI), which are model-free stochastic analogues of the classical RVI method for average-reward MDPs. These algorithms have low per-iteration complexity, making them well-suited for large state space problems. We extend the almost-sure convergence analysis of RVI Q-learning algorithms developed by Abounadi, Bertsekas, and Borkar (2001) from unichain to weakly communicating MDPs. This extension is important both practically and theoretically: weakly communicating MDPs cover a much broader range of applications compared to unichain MDPs, and their optimality equations have a richer solution structure (with multiple degrees of freedom), introducing additional complexity in proving algorithmic convergence. We also characterize the sets to which RVI Q-learning algorithms converge, showing that they are compact, connected, potentially nonconvex, and comprised of solutions to the average-reward optimality equation, with exactly one less degree of freedom than the general solution set of this equation. Furthermore, we extend our analysis to two RVI-based hierarchical average-reward RL algorithms using the options framework, proving their almost-sure convergence and characterizing their sets of convergence under the assumption that the underlying semi-Markov decision process is weakly communicating.
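The RVI Q-learning update discussed in this abstract can be illustrated with a minimal sketch. Everything concrete here is an assumption made for illustration: the two-state toy MDP, the constant step size, the synchronous sweep over all state-action pairs (the analyzed algorithms are stochastic and asynchronous), and the choice of reference function f(Q) = Q(s0, a0), which is one of the options usually considered for RVI-style methods. The toy MDP is communicating but not unichain, since one stationary policy stays in state 0 forever and another stays in state 1 forever.

```python
# Hypothetical toy MDP: TRANSITIONS[state][action] = (next_state, reward).
# All transitions are deterministic; the best long-run average reward is 1.0
# (move to state 1, then stay there).
TRANSITIONS = {
    0: {0: (0, 0.0), 1: (1, 0.0)},
    1: {0: (1, 1.0), 1: (0, 0.0)},
}


def rvi_q_learning(transitions, step_size=0.1, sweeps=2000, ref=(0, 0)):
    """Synchronous RVI-style Q iteration with reference f(Q) = Q[ref]."""
    q = {s: {a: 0.0 for a in acts} for s, acts in transitions.items()}
    for _ in range(sweeps):
        f_q = q[ref[0]][ref[1]]  # reference value subtracted from every target
        new_q = {s: dict(acts) for s, acts in q.items()}
        for s, acts in transitions.items():
            for a, (s_next, reward) in acts.items():
                target = reward - f_q + max(q[s_next].values())
                new_q[s][a] = q[s][a] + step_size * (target - q[s][a])
        q = new_q
    # At a fixed point, f(Q) equals the optimal average reward.
    return q, q[ref[0]][ref[1]]


q_values, gain_estimate = rvi_q_learning(TRANSITIONS)
greedy_policy = {s: max(acts, key=acts.get) for s, acts in q_values.items()}
print(gain_estimate, greedy_policy)
```

In this sketch the reference value plays the role of the average-reward estimate, and the greedy policy read off the final table moves to state 1 and stays there.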

algorithm, equation, optimality equation, (16 more...)

2408.16262

Country:

North America > United States > New York (0.04)
North America > Canada > Alberta > Census Division No. 11 > Edmonton Metropolitan Region > Edmonton (0.04)
Europe > United Kingdom > England > Cambridgeshire > Cambridge (0.04)

Genre: Research Report > New Finding (0.46)

Technology:

Information Technology > Artificial Intelligence > Machine Learning > Reinforcement Learning (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Learning Graphical Models > Undirected Networks > Markov Models (1.00)

arXiv.org Artificial Intelligence, May-23-2024

Deterministic Policies for Constrained Reinforcement Learning in Polynomial-Time

McMahan, Jeremy

Constrained Reinforcement Learning (CRL) traditionally produces stochastic, expectation-constrained policies that can behave undesirably: imagine a self-driving car that randomly changes lanes or runs out of fuel. However, artificial decision-making systems must be predictable, trustworthy, and robust. One approach to ensuring these qualities is to focus on deterministic policies, which are inherently predictable and trustworthy. Moreover, they are easy to implement [10], reliable for autonomous vehicles [16, 12], and effective for multi-agent coordination [23]. Similarly, almost sure and anytime constraints [21] provide inherent trustworthiness and robustness, essential for applications in medicine [6, 22, 18], disaster relief [9, 29, 27], and resource management [20, 19, 24, 4]. Despite the advantages of deterministic policies and stricter constraints, their computation remains an open challenge in CRL. Our research aims to address this challenge by studying the computational complexity of computing deterministic policies for a wide range of constraint types. Consider a constrained Markov Decision Process (cMDP) denoted by M. Let C represent an arbitrary cost criterion and B be the available budget.

algorithm, constraint, induction hypothesis, (13 more...)

2405.14183

Country:

North America > United States > New York > New York County > New York City (0.04)
Europe > United Kingdom > England > Cambridgeshire > Cambridge (0.04)
Europe > Germany (0.04)

Genre: Research Report (0.40)

Industry: Transportation (0.86)

Technology:

Information Technology > Artificial Intelligence > Representation & Reasoning (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Reinforcement Learning (1.00)
Information Technology > Artificial Intelligence > Robots > Autonomous Vehicles (0.88)
Information Technology > Artificial Intelligence > Machine Learning > Learning Graphical Models > Undirected Networks > Markov Models (0.34)

Neural Information Processing Systems, Mar-14-2024, 17:43:33 GMT

Robustness and risk-sensitivity in Markov decision processes

We uncover relations between robust MDPs and risk-sensitive MDPs. The objective of a robust MDP is to minimize a function, such as the expectation of cumulative cost, for the worst case when the parameters have uncertainties. The objective of a risk-sensitive MDP is to minimize a risk measure of the cumulative cost when the parameters are known. We show that a risk-sensitive MDP of minimizing the expected exponential utility is equivalent to a robust MDP of minimizing the worst-case expectation with a penalty for the deviation of the uncertain parameters from their nominal values, which is measured with the Kullback-Leibler divergence. We also show that a risk-sensitive MDP of minimizing an iterated risk measure that is composed of certain coherent risk measures is equivalent to a robust MDP of minimizing the worst-case expectation when the possible deviations of uncertain parameters from their nominal values are characterized with a concave function.
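For orientation, the single-period identity that typically underlies an equivalence of this kind is the Gibbs (Donsker-Varadhan) variational formula, shown below. The symbols are assumptions introduced here and do not come from the listing: C is the cumulative cost, P the nominal distribution, Q an alternative distribution absolutely continuous with respect to P, and γ > 0 a risk-sensitivity parameter. The paper's own multi-period statement may be formulated differently.

```latex
\frac{1}{\gamma}\,\log \mathbb{E}_{P}\!\left[e^{\gamma C}\right]
  \;=\; \sup_{Q \ll P}\left\{ \mathbb{E}_{Q}[C] \;-\; \frac{1}{\gamma}\,\mathrm{KL}(Q \,\|\, P) \right\},
  \qquad \gamma > 0 .
```

Read from left to right, the exponential-utility (entropic) risk of the cost under the nominal model equals a worst-case expected cost in which deviations from the nominal model are penalized by their Kullback-Leibler divergence.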

risk measure, risk-sensitive mdp, robust mdp, (13 more...)

Country:

Asia > Japan > Honshū > Kantō > Tokyo Metropolis Prefecture > Tokyo (0.14)
North America > United States > New Jersey > Hudson County > Hoboken (0.14)
North America > United States > Massachusetts > Middlesex County > Cambridge (0.04)
Europe > Germany > Berlin (0.04)

Technology:

Information Technology > Artificial Intelligence > Representation & Reasoning > Optimization (0.47)
Information Technology > Artificial Intelligence > Machine Learning > Learning Graphical Models > Undirected Networks > Markov Models (0.42)

arXiv.org Artificial Intelligence, Apr-19-2023

The In-Sample Softmax for Offline Reinforcement Learning

Xiao, Chenjun, Wang, Han, Pan, Yangchen, White, Adam, White, Martha

Reinforcement learning (RL) agents can leverage batches of previously collected data to extract a reasonable control policy. An emerging issue in this offline RL setting, however, is that the bootstrapping update underlying many of our methods suffers from insufficient action-coverage: the standard max operator may select a maximal action that has not been seen in the dataset. Bootstrapping from these inaccurate values can lead to overestimation and even divergence. There are a growing number of methods that attempt to approximate an in-sample max that only uses actions well-covered by the dataset. We highlight a simple fact: it is more straightforward to approximate an in-sample softmax using only actions in the dataset. We show that policy iteration based on the in-sample softmax converges, and that for decreasing temperatures it approaches the in-sample max. We derive an In-Sample Actor-Critic (AC), using this in-sample softmax, and show that it is consistently better than or comparable to existing offline RL methods, and is also well-suited to fine-tuning. We release the code at github.com/hwang-ua/inac. A common goal in reinforcement learning (RL) is to learn a control policy from data. In the offline setting, the agent has access to a batch of previously collected data. This data could have been gathered under a near-optimal behavior policy, from a mediocre policy, or a mixture of different policies (perhaps produced by several human operators). A key challenge is to be robust to this data gathering distribution, since we often do not have control over data collection in some application settings.
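The idea of an in-sample softmax that approaches the in-sample max as the temperature decreases can be sketched for a single state. This is a minimal illustration under assumptions, not the released implementation: it uses the standard log-sum-exp form restricted to actions that appear in the dataset, and the function name, the action labels, and the Q-values are all hypothetical.

```python
import math


def in_sample_softmax(q_values, seen_actions, tau):
    """Return tau * log(sum over seen actions of exp(q / tau)), computed stably."""
    if tau <= 0:
        raise ValueError("tau must be positive")
    in_sample = [q_values[a] for a in seen_actions]
    if not in_sample:
        raise ValueError("no in-sample actions for this state")
    top = max(in_sample)  # subtract the max before exponentiating to avoid overflow
    return top + tau * math.log(sum(math.exp((q - top) / tau) for q in in_sample))


# Hypothetical Q-values for one state. "jump" never appears in the dataset,
# so its (possibly overestimated) value must not influence the backup.
q = {"left": 1.0, "right": 2.0, "jump": 10.0}
seen = {"left", "right"}

warm = in_sample_softmax(q, seen, tau=1.0)   # smooth upper bound on the in-sample max
cold = in_sample_softmax(q, seen, tau=0.01)  # close to the in-sample max of 2.0
print(warm, cold)
```

Because the unseen action is excluded from the sum, its value of 10.0 never enters either result, which is the property that distinguishes this backup from an unrestricted max.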

artificial intelligence, machine learning, reinforcement learning, (14 more...)

2302.14372

Country:

North America > Canada > Alberta (0.14)
Europe > United Kingdom > England > Oxfordshire > Oxford (0.04)

Genre: Research Report > New Finding (0.68)

Technology: Information Technology > Artificial Intelligence > Machine Learning > Reinforcement Learning (1.00)

Abouheaf, Mohammed, Gueaieb, Wail, Spinello, Davide, Al-Sharhan, Salah

A Data-Driven Model-Reference Adaptive Control Approach Based on Reinforcement Learning

arXiv.org Artificial Intelligence, Mar-17-2023

Model-reference adaptive systems refer to a consortium of techniques that guide plants to track desired reference trajectories. Approaches based on theories like Lyapunov, sliding surfaces, and backstepping are typically employed to advise adaptive control strategies. The resulting solutions are often challenged by the complexity of the reference model and those of the derived control strategies. Additionally, the explicit dependence of the control strategies on the process dynamics and reference dynamical models may contribute to degrading their efficiency in the face of uncertain or unknown dynamics. A model-reference adaptive solution is developed here for autonomous systems where it solves the Hamilton-Jacobi-Bellman equation of an error-based structure. The proposed approach describes the process with an integral temporal difference equation and solves it using an integral reinforcement learning mechanism. This is done in real-time without knowing or employing the dynamics of either the process or reference model in the control strategies. A class of aircraft is adopted to validate the proposed technique.

artificial intelligence, machine learning, reinforcement learning, (15 more...)

doi: 10.1109/ROSE52750.2021.9611772

2303.09994

Country:

North America > United States > Massachusetts (0.04)
North America > United States > Washington > King County > Auburn (0.04)
North America > Canada > Ontario > National Capital Region > Ottawa (0.04)

Genre: Research Report (0.50)

Technology:

Information Technology > Artificial Intelligence > Representation & Reasoning (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Reinforcement Learning (1.00)

Wan, Yi, Sutton, Richard S.

On Convergence of Average-Reward Off-Policy Control Algorithms in Weakly Communicating MDPs

arXiv.org Artificial Intelligence, Nov-5-2022

We show that two average-reward off-policy control algorithms, Differential Q-learning (Wan, Naik, & Sutton 2021a) and RVI Q-learning (Abounadi, Bertsekas, & Borkar 2001), converge in weakly communicating MDPs. Weakly communicating MDPs are the most general MDPs that can be solved by a learning algorithm with a single stream of experience. The original convergence proofs of the two algorithms require that the solution set of the average-reward optimality equation only has one degree of freedom, which is not necessarily true for weakly communicating MDPs. To the best of our knowledge, our results are the first showing average-reward off-policy control algorithms converge in weakly communicating MDPs. As a direct extension, we show that average-reward options algorithms for temporal abstraction introduced by Wan, Naik, & Sutton (2021b) converge if the Semi-MDP induced by options is weakly communicating.
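The Differential Q-learning update named above uses a temporal-difference error of the form reward minus the average-reward estimate, plus the best next-state value, minus the current value, and applies that same error to both the action values and the average-reward estimate. The sketch below is a batched, synchronous variant written under assumptions: the two-state toy MDP, the constant step sizes `alpha` and `eta`, and the full sweeps over all state-action pairs are illustrative choices, whereas the analyzed algorithm learns from a single stream of sampled experience.

```python
# Hypothetical toy MDP: TRANSITIONS[state][action] = (next_state, reward).
# It is communicating but not unichain; the optimal average reward is 1.0.
TRANSITIONS = {
    0: {0: (0, 0.0), 1: (1, 0.0)},
    1: {0: (1, 1.0), 1: (0, 0.0)},
}


def differential_q_sweeps(transitions, alpha=0.1, eta=1.0, sweeps=2000):
    """Batched sketch of the Differential Q-learning update rule."""
    q = {s: {a: 0.0 for a in acts} for s, acts in transitions.items()}
    avg_reward = 0.0
    for _ in range(sweeps):
        # TD errors for every pair, all computed from the current estimates.
        deltas = {}
        for s, acts in transitions.items():
            for a, (s_next, reward) in acts.items():
                deltas[(s, a)] = (
                    reward - avg_reward + max(q[s_next].values()) - q[s][a]
                )
        # The same TD errors drive both the action values and the reward rate.
        for (s, a), delta in deltas.items():
            q[s][a] += alpha * delta
        avg_reward += eta * alpha * sum(deltas.values())
    return q, avg_reward


q_values, avg_reward = differential_q_sweeps(TRANSITIONS)
greedy_policy = {s: max(acts, key=acts.get) for s, acts in q_values.items()}
print(avg_reward, greedy_policy)
```

Unlike an RVI-style update, no reference state-action pair is needed here: the average-reward estimate is a separate quantity, and the action values are only determined up to an additive constant.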

artificial intelligence, machine learning, reinforcement learning, (17 more...)

2209.15141

Country: North America > Canada > Alberta > Census Division No. 11 > Edmonton Metropolitan Region > Edmonton (0.14)

Genre: Research Report (0.84)

Technology:

Information Technology > Artificial Intelligence > Representation & Reasoning (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Reinforcement Learning (1.00)